159 research outputs found

An extensive empirical study of collocation extraction methods

Author: Pavel Pecina
Publication date: 01/01/2005

This paper presents the current status of an ongoing research study of collocations – an essential linguistic phenomenon with a wide spectrum of applications in the field of natural language processing. The core of the work is an empirical evaluation of a comprehensive list of automatic collocation extraction methods using precision-recall measures and a proposal of a new approach integrating multiple basic methods and statistical classification. We demonstrate that combining multiple independent techniques leads to a significant performance improvement in comparison with individual basic methods.

CiteSeerX

Crossref

An augmented three-pass system combination framework: DCU combination system for WMT 2010

Author: Du Jinhua
Pecina Pavel
Way Andy
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/07/2010

This paper describes the augmented three-pass system combination framework of the Dublin City University (DCU) MT group for the WMT 2010 system combination task. The basic three-pass framework includes building individual confusion networks (CNs), a super network, and a modified Minimum Bayes-risk (mConMBR) decoder. The augmented parts for the WMT 2010 tasks include 1) a rescoring component which is used to re-rank the N-best lists generated from the individual CNs and the super network, 2) a new hypothesis alignment metric – TERp – that is used to carry out English-targeted hypothesis alignment, and 3) more backbone-based CNs, which are employed to increase the diversity of the mConMBR decoding phase. We took part in the combination tasks of English-to-Czech and French-to-English. Experimental results show that our proposed combination framework achieved improvements of 2.17 absolute points (13.36 relative points) and 1.52 absolute points (5.37 relative points) in terms of BLEU score over the best single system on the English-to-Czech and French-to-English tasks, respectively. We also achieved better performance on human evaluation.

Irish Universities

DCU Online Research Access Service

Towards a user-friendly webservice architecture for statistical machine translation in the PANACEA project

Author: Pecina Pavel
Poch Marc
Toral Antonio
Way Andy
Publication date: 01/01/2011

This paper presents a webservice architecture for Statistical Machine Translation aimed at non-technical users. A workflow editor allows a user to combine different webservices using a graphical user interface. In the current state of this project, the webservices have been implemented for a range of sentential and sub-sentential aligners. A common interface and a common data format allow the user to build workflows that exchange different aligners.

CiteSeerX

Irish Universities

DCU Online Research Access Service

UPF Digital Repository

Adapting SMT Query Translation Reranker to New Languages in Cross-Lingual Information Retrieval

Author: Pecina Pavel
Saleh Shadi
Publication date: 01/01/2016

We investigate the adaptation of a supervised machine learning model for reranking of query translations to new languages in the context of cross-lingual information retrieval. The model is trained to rerank multiple translations produced by a statistical machine translation system and optimize retrieval quality. The model features do not depend on the source language and thus allow the model to be trained on query translations coming from multiple languages. In this paper, we explore how this affects the final retrieval quality. The experiments are conducted on a medical-domain test collection in English and multilingual queries (in Czech, German, French) from the CLEF eHealth Lab series 2013–2015. We adapt our method to allow reranking of query translations for four new languages (Spanish, Hungarian, Polish, Swedish). The baseline approach, where a single model is trained for each source language on query translations from that language, is compared with a model co-trained on translations from the three original languages.

Biblio at Institute of Formal and Applied Linguistics

Adaptation of Machine Translation to Specific Domains and Applications

Author: Pecina Pavel
Publication date: 08/01/2018

Matematicko-fyzikální fakult

CU Digital Repository

Towards using web-crawled data for domain adaptation in statistical machine translation

Author: Giagkou Maria
Papavassiliou Vassilis
Pecina Pavel
Prokopidis Prokopis
Toral Antonio
Way Andy
Publication date: 30/05/2011

This paper reports on ongoing work focused on domain adaptation of statistical machine translation using domain-specific data obtained by domain-focused web crawling. We present a strategy for crawling monolingual and parallel data and their exploitation for testing, language modelling, and system tuning in a phrase-based machine translation framework. The proposed approach is evaluated on the domains of Natural Environment and Labour Legislation and two language pairs: English–French and English–Greek.

DCU Online Research Access Service

MTMonkey: A Scalable Infrastructure for a Machine Translation Web Service

Author: Dušek Ondřej
Pecina Pavel
Rosa Rudolf
Tamchyna Aleš
Publication date: 01/01/2013

We present a web service which handles and distributes JSON-encoded HTTP requests for machine translation (MT) among multiple machines running an MT system, including text pre- and post-processing. It is currently used to provide MT between several languages for cross-lingual information retrieval in the Khresmoi project. The software consists of an application server and remote workers which handle text processing and communicate translation requests to MT systems. The communication between the application server and the workers is based on the XML-RPC protocol. We present the overall design of the software and test results which document the speed and scalability of our solution. Our software is licensed under the Apache 2.0 licence and is available for download from the Lindat-Clarin repository and GitHub.

CiteSeerX

Biblio at Institute of Formal and Applied Linguistics

CUNI System for WMT16 Automatic Post-Editing and Multimodal Translation Tasks

Author: Bojar Ondřej
Helcl Jindřich
Libovický Jindřich
Pecina Pavel
Tlustý Marek
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2016

Neural sequence-to-sequence learning recently became a very promising paradigm in machine translation, achieving competitive results with statistical phrase-based systems. In this system description paper, we attempt to utilize several recently published methods used for neural sequential learning in order to build systems for WMT 2016 shared tasks of Automatic Post-Editing and Multimodal Machine Translation. Comment: Accepted to the First Conference of Machine Translation (WMT16).

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Publikationsserver der RWTH Aachen University

Biblio at Institute of Formal and Applied Linguistics